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Abstract 

A simple procedure to map two probability measures in M'^ is 
the so-called Knothe-Rosenblatt rearrangement, which consists in re- 
arranging monotonically the marginal distributions of the last coordi- 
nate, and then the conditional distributions, iteratively. We show that 
this mapping is the limit of solutions to a class of Monge-Kantorovich 
mass transportation problems with quadratic costs, with the weights 
of the coordinates asymptotically dominating one another. This en- 
ables us to design a continuation method for numerically solving the 
optimal transport problem. 

Keywords: optimal transport, rearrangement of vector-valued maps, 
Knothe-Rosenblatt transport, continuation methods. 

1 Introduction 

Given two Borel probability measures fi and u on M'^, a Borel map 5* : 
M"' M°' is said to be a transport map between fi and u if S'jj/i = i> 
where S'H/i denotes the push-forward (or image measure) of fi through u 
(i.e. S^fi{B) = fi{S~^{B)) for every Borel B). In the present article, we will 
focus on two particular transport maps: the Knothe-Rosenblatt transport 
and the Brenier's map. 
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The Knothe-Rosenblatt transport. The Knothe-Rosenblatt rear- 
rangement was independently proposed by Rosenblatt [U] for statistical pur- 
poses and by Knothe [1] in order to extend the Brunn- Minkowski inequal- 
ities. The principle is the following, as explained in Villani [8]. Let fi 
and I' be two Borel probability measures on M*^ and assume for simplic- 
ity for the moment that /i is absolutely continuous with respect to the 
Lebesgue measure. Let us denote by /i'^ (respectively u'^) the d-th marginal 
of /i (respectively u) and /^Jt,^^,.,),- • • , /^(x,,...,x2) (respectively 

uf'"^ y}, , \) the successive disintegrations (or conditional mea- 

sures) of /i (respectively z/) given Xd, {xd,Xd-i),---, (x^, ...,X2) (respectively 
given yd, {yd,yd-i), (l/d, 1/2))- Now let = Td{xd) be the monotone 
nondecreasing map transporting /i*^ to u'^, such a map is well-defined and 
unique as soon as /id has no atoms and in this case, it is explicitly given 
by Td = G^^ o Fd (with Fd{a) := /i((-oo,a]) and Gd{a) := z/((-oo, a])). 
Then let Td^i = Td^i{xd-i,Xd) be such that Td{.,Xd) is monotone and maps 
to i^Taixd)' repeats the construction (well-known by statisticians 

under the name of conditional quantile transforms) iteratively until we de- 
fine Ti{xi,X2, ■■■,Xd), which is monotone in Xi and transports /i^^^ onto 
^T2ix2 xa)- FiiiciUy, the Knothe-Rosenblatt rearrangement T is defined by 
T(x) = (Ti{xi,X2, ...,Xd), ...,Td-i{xd-i,Xd),Td{xd))- Obviously, T is a trans- 
port map from fi to z/, i.e. Tjj/i = u. By construction, the Knothe transport 
T has a triangular Jacobian matrix with nonnegative entries on its diago- 
nal. Note also that the computation of the Knothe transport only involves 
one-dimensional monotone rearrangements and that it is well defined as soon 
the measures one transports have no atoms. The precise assumption is the 
following. 

Assumption (H-source): the measure /i*^, as well as /i"^— almost all the 
measures and the measures for /i'^— a.e. Xd and a.e. 

Xd-i---VLp to almost all the measures .j.^, which are all measures on 
the real line, must have no atoms. 

Notice that (H-source) is satisfied as soon as fi is absolutely continuous 
with respect to the Lebesgue measure. 

The Monge-Kantorovich problem and the Brenier's map. Op- 
timal transportation theory provides an alternative way to transport fi to 
u. We recall that in the case of the quadratic cost, the Monge-Kantorovich 
problem reads as 

inf / \x — y\'^dir{x,y) (1.1) 

where r(/i, u) denotes the set of transport plans between fi and u i.e. the 
set of probability measure on x with marginals fi and u. We refer to 
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the books of Villani [7j, [H] for a modern account of optimal transportation 
theory. The hnear problem (jl.ip is a relaxation of the Monge problem 



inf / 



X — S{x)\'^dn{x) 



(1.2) 



When /i is absolutely continuous with respect to the Lebesgue measure, Bre- 
nier [2] proved that f 1 1.2 1) has a unique solution which is characterized by the 
fact that it is the gradient of some convex function. More precisely, there 
exists a unique (up to constants and /i-a.e. equivalence) convex function 
V : M"' ^ R such that VV^fi = v. Also {id x Vy)ti/i is the only solution of 
(11.11) and W is called the Brenier's map between ^ and v. 

The Knothe-Rosenblatt as a limit of optimal transportation 
plans. Let us slightly modify the quadratic cost in (II. ip and replace it 
with the weighted quadratic cost 



where the Ai(£)'s are positive scalars depending on a parameter £ > 0. If /i is 
absolutely continuous with respect to the Lebesgue measure, the correspond- 
ing optimal transportation problem admits a unique solution T^. When in 
addition, for all k G {1, ...,d — 1}, \k{s) / \k+i{e) — > as e ^ 0, it is natural 
to expect the convergence of to the Knothe transport T. We will show 
that this convergence holds provided u satisfies some additional condition, 
and namely 

Assumption (H-target): the measure u'^, as well as i/*^— almost all the 
measures I'f"^, and the measures uf'^, for z/'^— a.e. Vd and iyf~^—a..e. 

yd ' ydiiid—\ ^ yd 

Dd-x- . .up to almost all the measures i^^^^ ,^3, which are all measures on the 
real line, must have no atoms. 

Notice that (H-target) is not natural as (H-source) is. Yet, we will show 
a counter-example to the convergence result when it is not satisfied. (H- 
target) as well is satisfied should u be absolutely continuous (actually, this 
assumption is slightly weaker then (H-source), since the last disintegration 
measures are not concerned). 

This convergence result was conjectured by Y. Brenier as a very natural 
one, and actually its proof is not hard. Yet, it was not known before that 
extra assumptions on v were needed. This makes one of the point of interest 
of this paper. 

The other point is what we investigate later in the paper, i.e. the other 
direction: from Knothe to Brenier. We will study the dependence e ^ T^hy 
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means of the evolution with respect to e of the dual variables. This will enable 
us, to design a numericaly strategy to approximate all the optimal transports 
Te taking as initial condition the (cheap to compute) Knothe transport T. 

An example. To illustrate the problem in a particular case where ex- 
plicit solutions are available, take d = 2, and /i and u two Gaussian measures 

where /i = iV (0, h) and i/ = iV ^0, (^^ Take Ai (e) = e and A2 (e) = 1. 

Then it can be verified that is linear, and that its matrix in the canonical 
basis of is 



1 f ae + \J ac — W- 



^Jae^ + c + 2e^ac - V be c + eVac-b^^ 

which converges as e — to T = ^V*^ ^v^) ' ^^^^^ precisely 

the matrix of the Knothe transport from fi to u. 

Organization of the paper. In section [2], we show, under suitable 
assumptions, that the optimal transportation maps for the cost converge to 
Knothe's transport map as the parameter e goes to 0, we will also emphasize 
that some conditions are to be imposed on u for the convergence to hold. In 
section [3], we show that the evolution of the dual variables in the optimal 
transportation problem for cost the q is given by a well-posed ordinary 
differential equation. Finally in section HI we discretize this equation and 
give several numerical results. 



2 Knothe transport as a limit of quadratic 
optimal transports 

We directly state our first result, whose proof, in the spirit of F— convergence 
developments (see [1]), will require several steps. 

Theorem 2.1. Let fi and v he two probability measures on R*^ satisfying (H- 
source) and (H-target), respectively, with finite second moments, and'-fg be an 
optimal transport plan for the costs Ce{x, y) = Yl'i=i ~ ViY j for some 

weights Afe(e) > 0. Suppose that for all k E {1, ...,d — 1} , \k{e) / \k+i{e) 
as e — ^ 0. Let T be the Knothe-Rosenblatt map between fi and v and 'Jk ^ 
P(]R'^ X M'^) the associated transport plan (i.e. '-fx '■= {'id x T)^fi). Then 
^ IK as e ^ 0. 

Moreover, should the plans 7^ be induced by transport maps T^, then these 
maps would converge to T in L'^{fi) as e ^ 0. 
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Proof. Take the plans 7^ that are optimal for the Brenier-like cost given 
by 

d 

i=l 

(we suppose for simplicity Xd{s) = 1 and \i{e) / \i^i{e) 0). Suppose (which 
is possible, up to subsequences) 7^ ^ 7. We want to prove 7 = '-)k- 
By comparing 7^ to 7x and using optimality we first get 

Csd-^e < I Csd'^K (2.1) 



and, passing to the limit as e 0, since q converges locally uniformly to 

cW(x,?/) = {xd - VdY, we get 

c(^)c/7 < / c^'^^d-fK. 



Yet, the function c^^^ only depends on the variables Xd and yd and this shows 
that the measure {7id)i1 gets a better result than {7id)^'yK with respect to 
the quadratic cost (vr^ being the projection onto the last coordinates, i.e. 
7id{x,y) = {xd,yd))- Yet, the measure •jk has been chosen on purpose to 
get optimality from fid to Ud with respect to this cost, and the two measures 
share the same marginals. Moreover, thanks to the assumptions on Hd, this 
optimal transport plan (which is actually induced by a transport map) is 
unique. This implies {Tid)},! = {'^d)ilK- Let us call 7*^ this common measure. 

We go back to (12.11) and go on by noticing that all the measures 7^ have 
the same marginals as '-/k and hence their (separate) projection onto Xd and 
yd are fid and Ud, respectively. This implies that {nd)^'j£ must realize a result 
which is worse than {7rd)0K as far as the quadratic cost is concerned and 
consequently we have 

/d—l „ 
\xd - yd\^d{'Kd)^lK{xd, yd) + ^ j {xi - yifd'-^e 
i=i ^ 



< Csd'Je < / Ce d'jK 



d-l 



\xd - yd?d{'Kd)^'^K{xd, Vd) + ^ i {xi - yifd'^K, 

i=i ^ 

which implies, by simplifying the common term in d{nd)^'~fK, dividing by 
Xd-i{e) and passing to the limit, 

Z^'-'^dj < [ C^'-'U^K 
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(we use the general notation c^{x, y) = \xk — yfcP)- We can notice that both 
integrals depend on the variables Xd-i and y^-i only. Anyway, we can project 
onto the variables {xd-i,Xd) and {yd-i,yd) (obtaining measures {TXd-i)il and 
(7rrf_i)jj7A') so that we disintegrate with respect to the measure 7*^. We have 



d-f'^{xd,yd) j \xd^i - yd~i\'^d-f'l^^]y^-^{xd-i,yd~i) 
< I dY{xd,yd) / \xd-i-yd-i\^dY^-]y^)^Ki^d-i,yd-i). (2.2) 



It is is sufficient to prove that the measures 7(1^^^^) share the same marginals 
on Xd-i and yd-i as the corresponding I'j^^ay^) k S^t that their quadratic 
performance should be worse than the corresponding performance of 7!^"^ x 
(this is because the Knothe measure has been chosen exactly with the inten- 
tion of being quadratically optimal on {xd-i, yd-i) once Xd and yd are fixed). 
Yet, fl2.2p shows that, on average, the result given by the those measures is 
not worse than the results of the optimal ones. Thus, the two results coincide 
for almost any pair (x^, yd) and, by uniqueness of the optimal transports (this 
relies on the assumptions on the measures utz^), we get 7^"^ x = 7f~\ n r-- 

" V^diyd) V^diyd) 1^ 

To let this proof work it is sufficient to prove that the projections of the two 
measures coincide for 7^^— a.e. pair {xd,yd)- For fixed {xd,yd) "we would like 
to prove, for any 

.4-1 



(and to prove an analogous equality for functions of yd-i)- Since we accept 
to prove it for a.e. pair (x^, yd), it is sufficient to prove this equality: 

d-f'^{xd,yd)'ipixd,yd) J (l){xd-i)d-f'l~^]y^^ 

dY{xd,yd)i^{xd,yd) j <^(^d-i)^7(t"2y^)_j^ 

for any and any ip. This means proving 

^{xd,yd)(p{xd-i)dY~'^ = j ^{xd,yd)(p{xd-i)d'jp^ , 

which is not trivial since we only know that the two measures 7"^"^ and 7^^ 
have the same marginals with respect to the pairs (xd-i.Xd), (z/d-i, yd) (since 
they have the same projections onto x and onto y) and {xd, yd) (since we just 
proved it). But here there is a function of the three variables {xd-i,Xd,yd)- 
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Yet, we know that the measure ■y'^ is concentrated on the set i/d = Td{xd) for 
a certain map T^, and this allows to replace the expression of yd, thus getting 
rid of one variable. This proves that the function ipi^d, yd)<P{xd-i) is actually 
a function of {xd-i,Xd) only, and that equality holds when passing from 7 
to 7x-The same can be performed on functions ipi^d, yd)<P{yd-i) but we have 
in this case to ensure that we can replace Xd with a function of yd, i.e. that 
we can invert T^. This is possible thanks to the assumption on i>d, since Td 
is the optimal transport from fid to Ud, but an optimal transport exists in 
the other direction as well and it gives the same optimal plan (thanks to 
uniqueness). These facts prove that the measures if^^y^) and I'^^avd) k h^^^ 
the same marginals and hence, since they are both optimal, they coincide for 
a.e. pair {xd,yd)- This implies = 7^^. 

Now, it is possible to go on by steps: once we have proven that 7^ = 7]^, 
let us take (12. ip and estimate all the terms with \xi — and i > h thanks 
to the optimality of 7^, thus getting 



//i— 1 « 
-2/iprf7;^ + ^Ai(e) / 

i^n i=l 



Xi - yifdje 



< J Csd-fe < J Cs d-fK 

/h—1 „ 
\xi - yi\^d-fK + '^Xi{e) / 

i^n 1=1 



{xi - yifd-f 



K, 



and consequently, by dividing by \h-i{s) and passing to the limit, 

c('^-i)rf7 < / c('*-i)rf7x. 



We disintegrate with respect to 7^^ and we act exacly as before: proving that 
the marginals of the disintegrations coincide is sufficient to prove equality of 
the measures. Here we will use test-functions fo the form 

i){xh, Xh+i, ...,Xd, yh, Vh+i, yd)<P{xh~i) 

and 

i){xh, Xh+i, ...,Xd, yh, Vh+i, yd)(i>{yh-i)- 

The same trick as before, i.e. replacing the variables y with functions of 
the variables x is again possible. To invert the trick and replace x with y 
one needs to invert part of Knothe's transport. This is possible since our 
assumptions imply that all the monotone transports we get are invertible. 
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In the end we get, as before, ^ = Ik ^- '^^^^ procedure may go on up to 
h = 2, thus arriving at 7 = 7;^:. 

We have now proven 7^ ^ 7^^. Yet, if all these transport plans come 
from transport maps, it is well known that (T^ x Z(i)(t/i ^ (T x implies 
Te ^ T in -L^(yu), for any p > 1, as far as Tg is bounded in U'{^). Actually, 
weak convergence is a simple consequence of boundedness: to go on, we can 
look at Young's measures. The assumption (the limit is a transport map 
as well) exactly means that all the Young measures are dirac masses, which 
implies strong convergence. In particular we get convergence and /i-a.e. 

convergence on a subsequence. □ 

Let us remark here that if instead of considering the quadratic cost Cg, 
one considers the more general separable cost 

d 

Ce{.x, y) := ^ Xi{e)ci{xi - yi) 
1=1 

where each q is a smooth strictly convex function (with suitable growth), 
then the previous convergence proof carries over. 

A counterexample when the measures have atoms We now show 
that interestingly, and perhaps counterintuitively, the hypothesis of absence 
of atoms in theorem 12. Ih s necessary not only for n, but also for u. We propose 
a very simple example in where /i is absolutely continuous with respect to 
the Lebesgue measure but u does not satisfy (H-target), and we show that the 
conclusion of theorem 12.11 fails to hold. On the square Q := [—1, 1] x [—1, 1], 
define fi such that fi{dx) = l{xiX2<o}dx/2 so that the measure is uniformly 
spread on the upper left and the lower right quadrants, and v = being 
S the segment [—1, 1] x {0}. 

The Knothe-Rosenblatt map is easily computed as {yi,y2) = T{x) := 
(2(xi + sgn{x2)),0). The solution of any symmetric transportation problem 
with = (e, 1) is (^1,2/2) = T°(x) := (a;i,0) (no transport may do better 
than this one, which projects on the support of i>). Therefore, in this example 
the optimal transportation maps fail to tend to the Knothe-Rosenblatt map. 
The reason is the atom in the measure u"^ = 60. 

Convergence even with atoms The convergence result of theorem 12.11 
requires the absence of atoms in the projections of u. This is obviously not 
the case if u itself is purely atomic! Yet, this will precisely be the case we 
will consider in the algorithm we propose in the sequel. The same proof may 
be extended to this case under the following assumption. Keep the same 
assumptions on fi but suppose that u is concentrated on a set S with the 
property 

y,z e S, yj^z^ydj^Zd. 
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This means that, if we restrict ourselves to S, then all the variables i/i for 
i < d are actually a function of the last variable yd- This is particularly 
useful when u is purely atomic, concentrated on a finite (or countable) set of 
points with different yd components. 

Just come back to the proof. The equality 

j dY{xd,yd)ip{xd,yd) J (l){xd-i)d-i'l-ly^^ 
= j dY{xd,yd)i^{xd,yd) j (t){xd-i)di'l;^ly^)^j^ 

only relied on yd being a function of x^, which is still true. The other equality, 
namely 

j dY{xd,yd)^ixd,yd) j 'Piyd~i)dl'(~^]y^) 
= j dj'^{xd,yd)i^{xd,yd) j 0(l/rf-i)c^7(t~L),x 

gives some extra troubles. It is not any more true that Xd is a function of 
yd- Yet, it is true that yd-i is a function of yd and this allows us to reduce 
the expression to functions of {xd, yd) only, which is sufficient to get equality. 
The same procedure may be performed at subsequent steps as well. 



3 An ODE for the dual variables 

In this section, we consider for simplicity the case d = 2 (although our method 
extends to higher dimensions), /i uniform on some convex polyhedron Q (for 
the sake of simplicity we will assume = 1) and = Y2iLi ^Vi where all 

(2) 

the points yi & Vt have a different second coordinate yi . For every e > 0, 
let be the diagonal 2x2 matrix with diagonal entries (e, 1) and let Cg be 
the quadratic cost defined by Ce(x, y) = As{x — y){x — y). We are interested 
in solving the family of optimal transportation problems 

inf / Ce{x,y)dv:{x,y) (3.1) 

for all values of the parameter e G [0, 1]. It is well-known, that (13. ip can be 
conveniently solved by the dual problem formulated in terms of prices: 

1 ^ f 

snp<^{p,e):=—'^Pi+ p*{x)dx, (3.2) 
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where = minj{c£(a;, yi) — Pi} and we impose as a normalization pi = 0. 

For each e, there is a unique maximizer p{e). For each {p,e) we define 
C(p, = {x G : inf j C£(x, — = Cs{x,yi) — Pi}- It is easy to check 
that $e := is concave differentiable and that the gradient of $e is 

given by 

|f(p)4-|Cfe.),|. 

By concavity of $e, the solution p{e) of (13.21) is characterized by the equation 
V$e(p(£:)) = 0. The optimal transportation between fi and u for the cost Cg 
is then the piecewise map taking value yi in the cell C{p{e),e))i. Our aim 
is to charcacterize the evolution of p{e) as e varies. Formally, differentiating 
the optimality condition V^{p{e),e) = 0, we obtain a differential equation 
for the evolution of p{e): 

^V,$(p(e), e) + Dl^^p{e),e) ■ ^(5) = 0. (3.3) 

Our aim now is to show that the equation (13. 3p is well-posed starting with the 
initial condition p[0) (corresponding to horizontal cells of area 1/N); this will 
involve computing the second derivatives of $ in (13.31) . proving their Lipschitz 
behavior as well as obtaining a negative bound on the larger eigenvalue of 
the negative semidefinite matrix -Dpp$ . 

The price vector p{e), along the evolution, will always be such that all 
the areas \C{p,e)i\ are equal (and are equal to l/N). Yet, we need to prove 
that the differential equation is well posed and we will set it in an open set, 

0= S^{p,e) : ^ < \C{p,e)i\ < ^ for every z| . (3.4) 

The initial datum of the equation will be such that |C(p(0), 0)j| = and 
we will look at the solution only up to the first moment where it exits O. 
Yet inside the set it will be well-posed and it will imply conservation of the 
areas. Hence we will never exit O. 

All the quantities we are interested in depend on the position of the 
vertices of the cells C{p,€)i, which are all polygons. Let us call x{p,€)fj the 
two extremal points of the common boundary between C{p, e)i and C{p, e)j, 
that we call D{e, p)ij (if such a common boundary exists; should it be a single 
point we consider the two points as coinciding). Each one of this points is 
obtained as the intersection of at least this common boundary with another 
one, or with the boundary of fl (which is supposed to be a polygon as well, 
so that in the neighbourhood of almost any point locally the boundary is a 
line). We want to investigate the dependence of these points with respect 
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to {p, e) and prove that this dependence is Lipschitz. Notice that each point 
x{p,£)^j is not defined for any value of (p, e) but only on a certain (closed) 
subset of the space x (0, 1). 

Lemma 3.1. The positions of the vertices x{p,e)fj depend in a Lipschitz 
way on p and e. 

Proof. Locally it is true that the same point x{p, e)fj is defined either by a 
system of equations 

Aix - y,i){x - yi) -Pi = A,{x - yj){x - yj) - pj, 
- yi){x - yi) -Pi = A^{x - yh){x - yu) - Ph 

in the case of a point on the intersection of two common boundaries, or by a 
system 

Ae{x - yi){x - yi) -Pi = A,{x - yj){x - yj) - pj, 
Lx = Iq 

in the case of intersection with the boundary dQ (locally being given by the 
equation Lx = Iq) . The first system, after simplifying, reads as 



2Ae{yj - yi)ix) = Ae{yj){yj) - Ae{yi){yi) - pj + Pi, 
2Ae{yh - yi){x) = Ae{yh){yh) - Ae{yi){yi) -ph+Pi, 



and the second as well may be simplified the same way. This means that 
they are of the form 

M{e)x = V{e,p) 
for a matrix M{e) which reads, in usual coordinates, 

M{e)={ \ \J %^ \. in the first case and 



^' - 


V?) 


(2) 

v) - 

(2) 

Vh - 


(2) 
(2) 

Vl 






(2) 


(2) 



M{e)=\ ^^^i ^» ' in the second. 

The vector V{e^p) is obtained regarding the right hand sides of the system, 
as in (13.71) . Both M and V depend Lipschitzly on (£,]?), with uniformly 
bounded Lipschitz constants. Hence, to check that the dependence of x on 
(e,p) is Lipschitz, we only need to bound (away from zero) detM(£:). The 
determinant of a 2 x 2 matrix is given by the product of the modulus of the 
two vector composing its lines, times the sinus of the angle between them. 
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In the first case the vectors are A^{yj — Ui) and A^ijjh — Hi), while in the 
second they are A^{yj — yi) and / = (/i, 12). In both cases they are the normal 
vectors to the sides of the cell we are considering. This implies, thanks to 
the lower bound on the areas of the cells, that the angles between these 
vectors may not be too small. Actually, since Vt is bounded and the cells 
are convex, the area of each cell is smaller than (diamf2)^ sin a/2, a being 
any angle between two neighbour sides. This implies a lower bound on sin a. 
The lower bound on the moduli comes from the fact that the vectors / are 

chosen with modulus one and the modulus of A^iyj — yi) is always greater 

(2) (2) 

than its vertical component, which is y^ — yj. , which is supposed different 
from zero for any pair Notice that in the first case the matrix has e 

in the determinant, even if we proved a lower bound on such a determinant, 
independent on e: this agrees with the fact that actually, this kind of crossing 
(between two common boundaries of two cells) will only happen for e > Eq 
(for e < Eq we only have almost horizontal strips crossing non-horizontal 
sides of □ 

Lemma 3.2. The function $ admits pure second derivatives with respect to 
p and mixed second derivatives with respect to p and e, and these second 
derivatives are Lipschitz continuous. 

Proof. We have proven that the positions of the points x{p, e)fj depend Lip- 
schitzly, with uniformly bounded Lipschitz constants, on p and e. Since the 
volumes of the cells C{p,e)i are Lipschitz functions of these points, this im- 
plies that Vp$ is C^'^. Hence it admits derivatives almost everywhere and 
we can compute them in the following way. 

The derivative of a volume of a polygonal cell is given, side by side, by 
the length of the side times the average of the components which are normal 
to such a side of the two derivatives x{p,e)^j and x{p,e)~.j (the other terms 
- which are mainly the terms at the corners - are of higher order). The 
equation of the side D{e,p)i,j is, as we know. 



Let us start from the derivatives of the cell C{p, e)i with respect to a variable 
Pj with j 7^ i. We differentiate (13. 8p with respect to pj and we get 



2^e(2/j - yi){x) = A,{yj){yj) - A,{yi){yi) -pj+Pi 



(3.8) 



and the normal unit vector to the side is 



n = 



AeiVj-yi) 
As{yj-yi)\' 



'2Ae{yj - yi){x) 



1. 
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This formula only works for x = x{p,e)fj and x = x{p,e)~j. Obviously it 
ony works where they are different iable, i.e. almost everywhere. Hence the 
derivative, by summing up and rescaling the normal vector, is given by 



d\C{p,e)i\ _ kj 



dpj 2\A^{yj-yi 



(3.9) 



where lij is the length of D{e,p)ij. 

As far as the derivative with respect to Pi is concerned, it is not difficult 
to check that we have (by summing up the results on every side) 

d\C{p,e)i\ ^ ^ hj 

QVr ^2|A,(y,-y,)|' ^'-'"^ 

where the sum is performed on all the indices j such that the cell C(p, e)j is 
in contact with the cell C{p,e)i. 

Automatically, since these derivatives only depend on the values of kj, 
which depend in a Lipschitz manner on the positions of x = x{p,e)fj, they 
are Lipschitz as well. This proves that $£ is actually C^'^ and that these 
derivatives are well defined and admit the previous expressions (I3.9p - fl3.10p 
everywhere. 

The computation of the derivatives with respect to e is a bit trickier. We 
derive again (13. 8p . but with respect to e. Since dA^/de = B, we get 

2A{y^ - y^){x) = -2B{y, - y,){x) + B{y,){y,) - B{y,){y,) 

= 2B{y,-y,) f - x 



Then we renormalize the normal vector, sum up the results for x = x{p, e)^^ 
and X = x{p, e)~^, multiply by the lengths and sum up the results for all the 
sides, and get 

d\C{p,e)i\ _ B{yj - yi){yj + yj - x{p, £)+■ - x{p, e)r.) 

In this well the result is Lipschitz in [p, e) and hence Vp$ is differ- 

entiable everywhere with respect to e, with Lipschitz derivative. 

□ 

We can come now back to the evolution of p = p{e) and consider again 
the differential equation (13.30 . To solve this equation we need to prove that 
the matrix -Dpp$ is actually invertible (for numerical purpose, we will also 
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need to bound its eigenvalues away from zero). It is important to recall that 
we look at the evolution of the vector p = {p2, . . . ,Pn), since we may assume 
Pi{e) = for all e. Hence, we will not look at the entries 1 in the vectors 
or the matrices. The matrix we consider is M := ~{Dpp^)ij=2,...,N has the 
following properties: 

• on each line, outside the diagonal we have negative terms Mj j = 

~2\A,{y,-y,)\^ 

• each element on the diagonal is the sum of minus all the others on the 
same line (hence it is positive), and possibly of the term which should 
be in the same line at the first column; 

• an entry of the matrix is non-zero if and only if the cells C{p,e)i 
and C{p,e)j share a common boundary with positive length; 

• in particular, for any pair (i, j), even if the entry at place is zero, 
it is possible to find a path i = i^, ii, i2, ■ ■ ■ ,ik = j so that the matrix 
has non-zero values at all the positions {ih,ih+i)', 

• the total of the entries of the first column (the one which is not present 
in the matrix) is strictly positive. 

The invertibility of M is ensured by the following: 
Lemma 3.3. Let the matrix M satisfy the following properties 

(HI) for all i, Mi^i > ^ \Mij\, 
(if 2) there exists i such that Mi^i > 

(i73) for any pair {i,j) there is a sequence io, ii, 22, • • • ,4 
with ii = i, ik= j, and Mi^^i^^^ ^ 0. 

then M is invertible. 

Proof. Let x G Ker(M) and let i be an index such that \xi\ is maximal. We 
may suppose for simplicity that xj is positive. Then we have 

= M..XJ - J2 Mi,Xj > Nk-x- - J2 Mj.xj =xJmj.-Y, M^j j > 0- 
j j \ j / 
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This implies that all inequalities are equalities and in particular xj = xj 
whenever M^^ ^ 0. Hence, the entries of x on all the indices which are 
"neighbours" of i equal xj (and they are maximal as well). This allows to 
repeat the argument replacing i with another maximizing index j and so on... 
since any index is connected by a chain of neighbours to i, we get that all the 
entries are equal. But this implies that the vector in the kernel we selected 
must be a multiple of the vector (1, 1, . . . , 1). Yet, this vector is not in the 
kernel since the sum of the elements on each line is not zero for all lines, by 
assumption {H2). This proves that M is invertible. □ 

Finding a lower bound for the modulus of the eigenvalues of M, i.e. 
quantifying its inversibility is not straightforward. Indeed, the properties 
(HI), {H2), {H3) are not sufficient to get this bound, even if we fix the 
norm of the remainding column as the following counter-example shows. The 
determinant of the matrices 

I I -e 

M,= \ -e 1 -{l-e) 
\ -{l-e) 1 

is 2e{l — e) — > 0, which implies that some eigenvalue as well goes to 0. 

We will obtain a positive lower bound by a compactness argument, but we 
will use something stronger than simply assumptions (-f^l), {H2), {H3). The 
idea is that assumption {H3) is not closed, but it stays closed when we replace 
it with the stronger condition of the matrix being associated to a partition 
(as is the case for M = —D^p^). In this case if one connection degenerates 
(i.e. a common boundary reduces to a point), some other connections will 
play the role. 

Lemma 3.4. There is a positive uniform lower bound on the least eigen- 
value of any matrix M associated to the cell partition corresponding to a pair 

Proof. The proof will be obtained by contradiction. To this aim, take a 
sequence of partitions of fl into sets {fi'^)i=i,...,N- We assume these sets to be 
convex polygons with a bounded number of sides. This is the case for the cells 
associated to pairs {p,e). We also know that their areas are always bounded 
between 1/2N and 2/N). These partitions give rise to a certain topology 
of connections between the cells. Up to subsequences, we may suppose that 
this topology is always the same on all the partitions of the sequence (since 
the number of possible topologies is finite). Up to subsequences, we also 
have convergence in the Hausdorff distance. This means that for any i we 
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have fi" ^li, and this convergence, which is the same as the convergence 
of all the vertices, preserves the areas, the convexity, the upper bound on 
the number of sides, the fact of being a partition... The matrices associated 
to these partitions depend continuously on these sets (with respect to this 
convergence, since they actually depend on the positions of the vertices). 
Notice that it is possible that a side reduces its length along the sequence up 
to becoming a single point in the limit. Yet, two cells which share a boundary 
along the sequence will do it along the whole sequence (thanks to our choice 
of not changing the topology) and at the limit, either they share a side as 
well, or they share a point only, but in this case the terms kj converged to 
zero. Hence we can associate to all the partitions their matrices and this 
correspondence is continuous. We have a sequence of matrices M„ — *■ M and 
let us suppose that some eigenvalue A["^ goes to zero. This would imply that 
the matrix M is associated to a partition but has a zero eigenvalue. This 
is not possible, thanks to the proof of lemma 13.31 above. M is associated to 
a partition and hence satisfies assumptions (HI) and {H3). To check {H2) 
we observe that the column that we remove, the first one, is associated to 
the first cell and, up to some rescaling but bounded factor |Ae(yi — yj)\, its 
entries are the lengths of its sides. Yet, this cell conserves the area bounds 
we had on the sequence and its entry cannot be all zero. □ 

From the previous results on the form and the regularity of the deriva- 
tives of Vp$, we deduce from the Cauchy-Lipschitz Theorem that the ODE 
(13.31) governing the evolution of the dual variables is well posed and actually 
characterizes the optimal prices: 

Theorem 3.5. Let p{e) be the solution of the dual problem ^3. 2\) (recall the 
normalization pi{e) = 0), then it is the only solution of the ODE: 

f^ie) = -Dl,Hp{e),e)-' (J-V ,Mpie) , e)^ (3.12) 

with initial condition p{0) such that all the horizontal strips C(p(0),0)i have 
area 1/N. 

4 Numerical results 
4.1 Algorithm 

The algorithm we propose consists simply in discretizing (13.121) (together 
with the initial condition p{0) determined as in Theorem 13.51) by an explicit 
Euler scheme. Let n be some positive integer and h := be the step size. 
Let us set po = p{0) and define prices inductively as follows. 
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• While (pfc, kh) belongs to the open set O defined by fl3.4l) . compute: 



Ak := -Dlp(l>{pk; kh), 6^ := —Vp<!>{pk,kh). 

Note that computing Ak and Sk by formulas fl3.9l) - fl3.10l) and (13.111) . 
requires to construct the cells C{kh,pk)i. 

• Solve the linear system A^z = 5^; by taking advantage of A^ being pos- 
itive definite, we use the conjugate gradient algorithm for minimizing 
Jk{z) = Ak{z){z) — 26k ■ z to solve this system exactly in — 1 steps. 
We denote by Zk the solution. 

• Update the prices by setting 

Pk+i =Pk + hzk- 

Thanks to the Lipschitz properties established in section El it is easy to 
check that for h small enough, {p^, kh) always remain in O and then p^ is 
well-defined for every k np to n. For such an h and since f l3.12p is generated 
by a Lipschitz function on O, it is well-known that the convergence of the 
Euler scheme is linear (see for instance [3]). Denoting by p^ the piecewise 
constant function having values pk on intervals [kh, {k + l)h), we thus get 
the following convergence: 

Theorem 4.1. For h small enough, the algorithm above is well-defined and 
the uniform error between p^ and the optimal price p is 0{h). 

4.2 Numerical experiments 

The construction of the cells at each step is achieved efficiently by an imple- 
mentation in Matlab. In the setting described above where d = 2,Vt = if' , 
II is the uniform distribution on f2, z/ = J2k=i ^Vk^ algorithm computes 
the cells n| = {x G : (x) = Uk} as well as the prices p\ of the cell Uk for 
£ = to e = 1. For e = which is the case where the transportation plan 
is the Knothe one, the computational task amounts to sorting the second 
component of the ?/fc's. Appropriate discretization steps are then chosen for 
the transition £ = to e = 1. At each step, the tesselation of Vt into the poly- 
hedral cells f2| is computed based on the prices Adjacency information 
on these cells is computed, as well as the length of the facet between two cells 
and the coordinates of its extreme points. This information allows one to 
formulate a discretized version of ODE (I3.12p using an Euler discretization 
scheme. 
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For geometric computations we use the Multi-Parametric Toolbox library, 
available online at littp://control.ee.etliz.cli/ mpt. In particular, polytope 



computes the tesselation of ft into the polyhedral cells (A slighlty modi- 
fied version of) mpt_buildAdjacency extracts adjacency information on this 
tesselation, from which the vertices and the lengths of the sides between 
two cells can be deduced. The library also incorporates convenient graphical 
routines. 

Error analysis Some numerical examples are presented below, for which 
relative errors in cell areas (i.e. deviation from the optimality conditions) 
are given as well as a comparison between the tesselations obtained with our 
method and the true solution. Another way to test our method is as follows. 
Let us consider the set 



C :-- 



2(7) := 



M2xR2 



X2i/2rf7 , 7 e r(/i, u) 



C is a closed convex subset of and it is strictly convex in the sense that 
its boundary contains no line segment. Denoting by 7^ the solution of (13.11) . 
it is easy to check that z{e) := z{'y^) is an extreme point of C and that 
ezi + Z2 = ezi{e) + Z2{s) is the equation of a supporting line of C at z{e) and 
this supporting line intersects C only at z{e). If we consider the correlation 
curve e G (0, 1) ^ z{e) it can be represented as the graph of a concave 
decreasing function whose slope (when it exists) at point z{e) is —e. In 
our numerical test, we will also present graphs comparing the true concave 
correlation curve to the one computed by our method. 

We give three instances of executions of our algorithm, with samples 
of respectively 5, 10, and 15 points. Taking weights {e,e~^) with e G 
(0, +00) (rather than (e, 1) with e G (0, 1)) we get the full evolution of the 
optimal transports from one Knothe's transport (horizontal strips) to the 
other (vertical strips). Further examples as well as videos can be found at 
http://alfred.galichon.googlepages.com/aiiisotropic. It should also 
be pointed that our method does not approximate the solution of a single 
optimal transportation problem but a whole family of such problems (which 
actually explains relatively high running times). 

Five sample points. We take as our sample set a sample of five points. 
We get the following errors: 



7^ steps 


Relative errors in cell areas 


Time 


100 


-4.41% 


2.66% 


3.41% 


-2.46% 


0.80% 


66 s 


500 


-0.88% 


0.54% 


0.68% 


-0.49% 


0.16% 


349 s 
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for which we draw in Figure [T] the partition obtained for e = 1 using an exact 
method, as well as the true evolutions of the componentwise correlations of 
the x's and the y's from e = until e = 1. 

Running the continuation algorithm with 100 and 500 discretization steps 
we obtained partition sketched below, where the relative error on the cells 
area is inferior to 5%. The evolution of the componentwise correlations from 
e = until e = 1 are also sketched and compared with their exact conterparts 
(the above curve). Running the algorithm with 500 discretization steps on 
a standard laptop took 349 seconds, and yields to the results below, where 
the maximal relative error on the cells area is 1%. The evolution of the 
componentwise correlations from e = until e = 1 are also sketched and 
compared with their exact counterparts. 

Ten sample points. Taking a sample of ten points, we obtain the 
following errors: 



7^ steps 


Relative errors in cell areas 


Time 


2,000 


-1.37% 


-12% 


-2.44% 


-0.09% 


2.27% 


< 1 h 


3,000 


-0.92% 


-8.23% 


1.63% 


0.06% 


1.52% 


< 1,5 h 




(continued) 


2,000 


0.27% 


5.71% 


2.12% 


7.36% 


-1.58% 


< 1 h 


3,000 


0.18% 


3.84% 


1.42% 


4.94% 


-1.05% 


< 1,5 h 



We draw in Figure [3] the partition obtained for e = 1 using an exact 
method, as well as the true evolutions of the componentwise correlations of 
the x's and the y's from e = until e = 1. 

Fifteen sample points. With a sample of 15 points, we get the following 
errors: 



# steps 


Relative errors in cell areas 


Time 


3,000 


-2.24% 


-0.12% 


-0.58% 


0.20% 


-3.20% 


~ 2 h 


10,000 


-0.67% 


-0.04% 


-0.17% 


0.06% 


-0.97% 


< 4 h 




(continued) 


3,000 


0.17% 


-2.66% 


4.25% 


-2.43% 


28.85% 


~ 2 h 


10,000 


0.05% 


-0.82% 


1.28% 


-0.74% 


9.42% 


< 4 h 




(continued) 


3,000 


-30.11% 


2.51% 


1.82% 


2.92% 


0.63% 


~ 2 h 


10,000 


-9.79% 


0.76% 


0.55% 


0.89% 


0.19% 


< 4 h 



Acknowledgements G.C and F.S. gratefully acknowledge the support 
of the Agence Nationale de la Recherche via the research project OTARIE. 
A.G. gratefully acknowledge the support of Chaire EDF-Calyon "Finance et 



19 



Developpement Durable," of Chaire AXA "Assurance et risques majeurs," 
and of Chaire Societe Generale "Risques Financiers". The authors wish to 
warmly thank Yann Brenier and Alessio Figalli for stimulating discussions. 



References 

[1] A. Braides, T -convergence for beginners, Oxford University Press, Oxford, 
2002. 

[2] Brenier, Y., Polar factorization and monotone rearrangement of vector- 
valued functions. Communications on Pure and Applied Mathematics 44, 
375-417, 1991. 

[3] Crouzeix M., Mignot A.L, Analyse numerique des equations 
differentielles, Masson, Paris, 1984. 

[4] Knothe, H., Contributions to the theory of convex bodies, Michigan 
Mathematical Journal 4, pp. 39-52, 1957. 

[5] Rachev S.T., Riischendorf L., Mass Transportation Problems. Vol. L The- 
ory; Vol. II : Applications, Springer- Verlag 1998. 

[6] Rosenblatt, M., Remarks on a multivariate transformation. Annals of 
Mathematical Statistics 23, pp. 470-472, 1952. 

[7] Villani, C, Topics in Optimal Transportation, Providence: American 
Mathematical Society, 2003. 

[8] Villani, C, Optimal transport: Old and New, lecture notes, Ecole d'ete 
de probabilites de Saint-Flour, to appear. New York: Springer, 2008. 



20 




Figure 3: Ten sample points. Top row: exact algorithm (gradient method). 
Middle row: continuation algorithm, 2000 steps. Bottom row: continuation 
algorithm, 3000 steps. 
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Figure 4: Ten sample points: evolution of the tesselation for 5 = to 5 = +00 
(from top left to bottom right). 
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Figure 5: Fifteen sample points. Top row: exact algorithm (gradient 
method). Middle row: continuation algorithm, 3000 steps. Bottom row: 
continuation algorithm, 10,000 steps. 
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